COVID-19 risk score as a public health tool to guide targeted testing: A demonstration study in Qatar

We developed a Coronavirus Disease 2019 (COVID-19) risk score to guide targeted RT-PCR testing in Qatar. The Qatar national COVID-19 testing database, encompassing a total of 2,688,232 RT-PCR tests conducted between February 5, 2020 and January 27, 2021, was analyzed. Logistic regression analyses were implemented to derive the COVID-19 risk score, as a tool to identify those at highest risk of having the infection. The score cut-off was determined using the ROC curve, based on the maximum sum of sensitivity and specificity. The score’s performance diagnostics were assessed. Logistic regression analysis identified age, sex, and nationality as significant predictors of infection, and these were included in the risk score. The ROC curve was generated and the area under the curve was estimated at 0.63 (95% CI: 0.63–0.63). The score had a sensitivity of 59.4% (95% CI: 59.1%-59.7%), specificity of 61.1% (95% CI: 61.1%-61.2%), a positive predictive value of 10.9% (95% CI: 10.8%-10.9%), and a negative predictive value of 94.9% (95% CI: 94.9%-95.0%). The concept and utility of a COVID-19 risk score were demonstrated in Qatar. Such a public health tool can have considerable utility in optimizing testing and suppressing infection transmission, while maximizing efficiency and use of available resources.


Introduction
Suppressing the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemic necessitates strategic preparedness and response [1]. The World Health Organization (WHO) has urged countries to adopt a "testing, tracing, and isolation" approach as the "backbone" of their SARS-CoV-2 national response [2]. However, to suppress the epidemic, deliver healthcare services to those in need, and ensure optimal use of resources, testing strategies need to be guided by real-time data analysis so that testing is prioritized to those at higher risk of exposure.
A risk score is an objective set of simple questions or measurements that can be used to assess the likelihood of an individual having a specific infection/disease condition [3][4][5][6]. Such scores have been useful in designing initial screening or testing strategies for a variety of diseases, as they reduce the need for more invasive, time-consuming, and expensive testing, while optimizing resource allocation by targeting individuals at higher risk of having the infection/disease [7]. Given the disease burden associated with SARS-CoV-2 infection, a risk score for this infection offers the benefits of earlier case detection, isolation of cases, and quarantine of contacts.
Qatar is a high-income country in the Arabian Gulf with a total population of 2.8 million, the majority of whom (89%) are expatriates from over 150 countries [8][9][10]. The nation's rapid development resulted in a unique socio-demographic structure dominated by men, who comprise 74% of the total population [8], and by younger age cohorts (ages 20-50 years), who likewise comprise 74% of the population [8].
This study had three objectives. The first was to present a derived risk score for SARS-CoV-2 infection that was developed during the first epidemic wave in April, 2020 to inform the national response to the epidemic. The second objective was to assess the prospective performance of this risk score on epidemic data collected after its derivation. The third objective was to update this risk score to end of January, 2021, and to assess its diagnostic metrics for future use as part of the national response.
The overarching goal of this study was to demonstrate the feasibility and utility of the concept of a Coronavirus Disease 2019 (COVID-19) risk score as a public health tool in an emergent epidemic, applying it to a specific country. Building on the public health utility of risk scores for other diseases such as diabetes [3][4][5][6], we believe that this study provides the first COVID-19 risk score for any country. The score has been named "COVID-19 risk score", given the prevailing public use of "COVID-19", as opposed to SARS-CoV-2.

Data source
We analyzed the national database for SARS-CoV-2 real-time polymerase chain reaction (RT-PCR) testing compiled by Hamad Medical Corporation (HMC), the main public healthcare provider in Qatar. The database includes results of all RT-PCR testing conducted in Qatar, regardless of whether it was for suspected SARS-CoV-2 cases, traced contacts, infection surveillance, or other purposes, between February 5, 2020 and January 27, 2021. February 5 is the day on which the first RT-PCR positive patient was diagnosed, a traveler arriving in Qatar [16].
Two risk scores were derived. The "original" Qatar COVID-19 risk score was derived in April 2020, during the expanding phase of the epidemic [11], utilizing half of the RT-PCR tests administered from February 5, 2020 to April 21, 2020. This half of the sample was chosen randomly. Performance of the risk score was subsequently assessed and validated utilizing the remaining half of the sample.
Similarly, an updated version of the Qatar COVID-19 risk score was derived utilizing half of the RT-PCR testing sample compiled from February 5, 2020 to January 27, 2021. Performance of the updated risk score was subsequently assessed and validated utilizing the remaining half of the sample.
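The random half-split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure (the analyses were run in SPSS); the function name `split_half`, the seed, and the toy test identifiers are assumptions made for the example.

```python
import random


def split_half(records, seed=0):
    """Randomly split records into a derivation half and a validation half."""
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Hypothetical test identifiers standing in for rows of the testing database.
tests = list(range(10))
derivation, validation = split_half(tests, seed=42)
```

The score is then fitted on `derivation` only and evaluated on `validation`, so that performance is not measured on the same tests used to estimate the coefficients.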

Laboratory methods
Nasopharyngeal and/or oropharyngeal swabs (Huachenyang Technology, China) were collected for PCR testing and placed in Universal Transport Medium (UTM). Aliquots of UTM were: extracted on a QIAsymphony platform (QIAGEN, USA) and tested with RT-qPCR using TaqPath™ COVID-19 Combo Kits (100% sensitivity and specificity [25]; Thermo Fisher Scientific, USA) on an ABI 7500 FAST (ThermoFisher, USA); extracted using a custom protocol [26] on a Hamilton Microlab STAR (Hamilton, USA) and tested using AccuPower SARS-CoV-2 Real-Time RT-PCR Kits (100% sensitivity and specificity [27]; Bioneer, Korea) on an ABI 7500 FAST; or loaded directly into a Roche cobas® 6800 system and assayed with a cobas® SARS-CoV-2 Test (95% sensitivity, 100% specificity [28]; Roche, Switzerland). The first assay targets the viral S, N, and ORF1ab regions. The second targets the viral RdRp and E-gene regions, and the third targets the ORF1ab and E-gene regions.
All tests were conducted at the HMC Central Laboratory or Sidra Medicine Laboratory, following standardized protocols.

Statistical analysis
Risk score derivation. Bivariable logistic regressions were performed to identify associations between each demographic factor and SARS-CoV-2 status. Multivariable logistic regression was then conducted to identify independent predictors of SARS-CoV-2 RT-PCR positivity and to estimate adjusted odds ratios (aOR) and corresponding 95% confidence intervals (CI). A p-value ≤0.05 in the multivariable analysis for any predictor was considered to provide strong evidence for an association with the outcome. Predictors with p-values ≤0.05 were retained in deriving the Qatar COVID-19 risk score.
Each predictor level was assigned scoring points using the corresponding regression model's β-coefficient multiplied by 10 (and rounded to the nearest integer) for ease of implementation, per established methodology [3][4][5][6][29]. An aggregate risk score for each test was then derived by summing the scoring points, given the individual's profile. No interaction terms between covariates were included, so as to keep the score simple and accessible for broad use. The score was used to determine an individual's level of risk of exposure to SARS-CoV-2 infection.
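The point-assignment rule can be illustrated with a short sketch. The β-coefficients, age bands, and nationality labels "A" and "B" below are hypothetical placeholders, not values from Table 2; only the rule itself (β multiplied by 10, rounded, then summed over the individual's profile) comes from the text.

```python
def scoring_points(betas):
    """Convert each level's beta coefficient to integer points (beta * 10, rounded)."""
    # Note: Python's round() sends exact halves to the nearest even integer;
    # the text does not say how such ties were handled.
    return {
        factor: {level: round(beta * 10) for level, beta in levels.items()}
        for factor, levels in betas.items()
    }


def risk_score(points, profile):
    """Sum the points of the levels that describe one individual."""
    return sum(points[factor][level] for factor, level in profile.items())


# Hypothetical coefficients; reference levels have beta = 0.
betas = {
    "age": {"<20": 0.0, "20-49": 0.32, "50+": 0.18},
    "sex": {"male": 0.0, "female": -0.41},
    "nationality": {"A": 0.0, "B": 1.37},
}
points = scoring_points(betas)
score = risk_score(points, {"age": "20-49", "sex": "female", "nationality": "B"})
```

With these placeholder coefficients the individual receives 3 points for age, -4 for sex, and 14 for nationality, for an aggregate score of 13. Because no interaction terms are used, the score is a plain sum and can be computed by hand.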

Risk score performance and validation. A receiver operating characteristic (ROC) curve was plotted to determine the capacity of the risk score to diagnose SARS-CoV-2 infection at different cut-off values resulting in a positive outcome. Sensitivity was defined as the proportion of those with a positive outcome when applying the score among tests with a positive RT-PCR result, that is, the capacity of the score to detect a true SARS-CoV-2 infection. Specificity was defined as the proportion of those with a negative outcome when applying the score among the tests with a negative RT-PCR result, that is, the capacity of the score to detect true absence of SARS-CoV-2 infection.
The optimal score cut-off/criterion to identify infected or uninfected cases was determined by selecting the value that maximized the sum of sensitivity and specificity. The area under the ROC curve (AUC) was also estimated to quantify the accuracy of the risk score, that is, how well the risk score separated infected from uninfected persons.
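The cut-off selection and the AUC can be sketched in plain Python. This is an illustration on toy data, not the SPSS procedure used in the study. Two choices are assumptions of the example: candidate cut-offs are taken midway between adjacent observed scores, and a score at or above the cut-off counts as a positive outcome.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Candidate cut-off that maximizes sensitivity + specificity."""
    values = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda c: sum(sens_spec(scores, labels, c)))


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: integer risk scores and RT-PCR outcomes (1 = positive).
scores = [2, 4, 5, 6, 7, 8, 9, 11]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
cutoff = best_cutoff(scores, labels)
```

On this toy sample the selected cut-off is 5.5, with sensitivity 1.0 and specificity 0.75, and the AUC is 0.875. Maximizing the sum of sensitivity and specificity is equivalent to maximizing Youden's J statistic (sensitivity + specificity - 1).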
The risk score derived utilizing half of the sample was applied to the other half of the sample to assess and validate its performance. The risk score's predictive and diagnostic performance was assessed by estimating the sensitivity, specificity, positive predictive value (PPV; probability of being infected given a positive outcome when applying the score), and the negative predictive value (NPV; probability of being uninfected given a negative outcome when applying the score).
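The dependence of PPV and NPV on infection prevalence can be made concrete with Bayes' rule. The sketch below plugs in the sensitivity (59.4%) and specificity (61.1%) reported for the updated score, and assumes that the overall RT-PCR positivity (200,646 of 2,688,232 tests) is a reasonable stand-in for positivity in the validation half, which is not stated separately in the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true positives per test
    fn = (1 - sensitivity) * prevalence        # false negatives per test
    tn = specificity * (1 - prevalence)        # true negatives per test
    fp = (1 - specificity) * (1 - prevalence)  # false positives per test
    return tp / (tp + fp), tn / (tn + fn)


# Assumption: overall positivity approximates positivity in the validation half.
prevalence = 200_646 / 2_688_232
ppv, npv = predictive_values(0.594, 0.611, prevalence)
```

Under this assumption the result is a PPV of about 11% and an NPV of about 95%, close to the reported 10.9% and 94.9%. The low PPV mainly reflects the low positivity among tests rather than a flaw specific to the score.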
Performance assessment of the original risk score on prospective data. The "original" Qatar COVID-19 risk score that was derived from testing data up to April 21, 2020 was applied to all testing data from April 22, 2020 up to January 27, 2021. The diagnostic metrics described above were calculated to assess the performance of the risk score on data collected after its derivation. All analyses were conducted using SPSS version 27.0 (Armonk, NY, USA).
The study was approved by the Hamad Medical Corporation (HMC IRB number MRC-05-011) and Weill Cornell Medicine-Qatar (WCM-Q IRB number 20-00017) Institutional Review Boards with waiver of informed consent. All methods were carried out in accordance with relevant guidelines and regulations.

Characteristics of SARS-CoV-2 RT-PCR testing conducted in Qatar
Between February 5, 2020 and April 21, 2020, a total of 69,820 individuals were tested for SARS-CoV-2, of whom 13,654 (19.6%) had more than one test. A total of 90,027 RT-PCR tests were performed for SARS-CoV-2 infection, and 10,362 were positive, for an overall RT-PCR positivity of 11.5% (95% CI: 11.3%-11.7%).
Between February 5, 2020 and January 27, 2021, a total of 1,041,022 individuals were tested for SARS-CoV-2, of whom 406,048 (39.0%) had more than one test. A total of 2,688,232 RT-PCR tests were performed, and 200,646 were positive, for an overall RT-PCR positivity of 7.5% (95% CI: 7.4%-7.5%). Characteristics of SARS-CoV-2 RT-PCR testing conducted in Qatar are presented in Table 1.
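The positivity figures above can be checked with a short calculation. The text does not state how the confidence intervals were computed; the sketch assumes a normal-approximation (Wald) interval and treats tests as independent, which ignores repeat testing of the same individual.

```python
import math


def positivity_with_ci(positives, total, z=1.96):
    """Proportion positive with a Wald (normal-approximation) 95% interval."""
    p = positives / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# Counts for February 5, 2020 to April 21, 2020, as reported above.
p, low, high = positivity_with_ci(10_362, 90_027)
```

Under these assumptions the first period gives 11.5% (11.3%-11.7%), matching the reported values; the same function applied to 200,646 positives out of 2,688,232 tests gives 7.5% (7.4%-7.5%).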

Original Qatar COVID-19 risk score
Risk score derivation. Bivariable logistic regression of half the testing sample from February 5, 2020-April 21, 2020 identified significant associations between individual variables, age, sex, and nationality, and RT-PCR outcome (Table 2A). All three demographic variables were retained in the multivariable logistic regression and were included in the risk score. Scoring points were lower for females than for males and higher for specific nationalities. The risk score was expressed as a mathematical formula illustrated in Box 1A.
Risk score performance and validation. The ROC curve was generated, and the AUC was estimated at 0.67 (95% CI: 0.66-0.67) (Fig 1A). A score cut-off value of 6.5 maximized the sum of sensitivity and specificity. This indicated that individuals with a risk score ≥6.5 should be prioritized for RT-PCR testing.

Updated Qatar COVID-19 risk score
Risk score derivation. Bivariable logistic regression of half the testing sample from February 5, 2020-January 27, 2021 identified significant associations between individual variables, age, sex, and nationality, and RT-PCR outcome (Table 2B). All three demographic variables were retained in the multivariable logistic regression and were included in the risk score. Scoring points were lower for females than for males and higher for specific nationalities. The risk score was expressed as a mathematical formula illustrated in Box 1B.
Risk score performance and validation. The ROC curve was generated, and the AUC was estimated at 0.63 (95% CI: 0.63-0.63) (Fig 1B). A score cut-off value of 5.5 maximized the sum of sensitivity and specificity. This indicated that individuals with risk scores ≥5.5 should be prioritized for RT-PCR testing.
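Applying the two cut-offs at the point of testing reduces to a single comparison, as in the sketch below. The names `CUTOFFS` and `prioritize` are illustrative, and the criterion is read as "score at or above the cut-off". Because the scores are integer sums and the cut-offs fall on half-points, reading it as "strictly above" would give the same result.

```python
# Cut-off values reported for the two versions of the score.
CUTOFFS = {"original": 6.5, "updated": 5.5}


def prioritize(score, version="updated"):
    """True if an individual with this risk score should be prioritized for RT-PCR."""
    return score >= CUTOFFS[version]


# An individual with a score of 6 is prioritized under the updated score only.
flag_updated = prioritize(6, "updated")
flag_original = prioritize(6, "original")
```

A score of 6 therefore falls above the updated cut-off but below the original one, which illustrates why the version in use needs to be tracked when the score is updated during an epidemic.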

Discussion
To illustrate the concept and public health value of COVID-19 risk scores, we derived a simple COVID-19 risk score for Qatar, which to our knowledge is the first for any country. The risk score demonstrated relatively strong performance, supporting the utility of using such risk scores to inform national testing strategies. A main finding is that the COVID-19 risk score performed similarly to other public health risk scores, such as those for diabetes [6][29][30][31][32][33][34]. Indeed, this risk score, though simple to implement, demonstrated reasonably high diagnostic accuracy (Fig 1 and Table 3). The original risk score, which was derived from early epidemic data up to April 2020 only, proved effective and offered performance comparable to that of the updated risk score based on all data until the present (Table 3). This further affirms the utility of such scores even when they are derived from a more limited set of testing data during a specific phase of the epidemic.
While our study provided a proof of concept for the use of such scores, their implementation can be further optimized. We reported a risk score derived over one year. The score's performance could have been improved, with higher diagnostic ability, if different scores had been derived in real time at every phase of the epidemic and their use updated continuously. It is remarkable that the risk score derived using a year of RT-PCR testing performed well, even though the epidemiology of the infection in Qatar evolved immensely during this year [11][14][15][16][17][18][19][20][21][22][23][24]. A month-by-month risk score, derived from RT-PCR testing of only the previous month, would have better predicted the risk of infection month by month. Given the ease of deriving such risk scores, continuous updating is feasible even in resource-limited settings, provided there is a minimal digital healthcare system to track RT-PCR testing.
A finding of this study is that there is always likely to be considerable variation in the risk of exposure to the infection based on basic demographics (such as age, sex, and nationality). This reflects the underlying dynamics of infection transmission in any country, such as those delineated earlier for Qatar [11][14][15][16][17][18][19][20][21][22][23][24]. Biological factors such as age [35][36][37][38][39][40][41] may also cause variation in susceptibility to the infection or in the likelihood of the infection being symptomatic, which may affect the likelihood of testing or of a positive test outcome. The subpopulation most affected by COVID-19 in Qatar during the first wave was the craft and manual workers living in shared housing accommodations, that is, in settings similar to dormitories, where they shared common spaces such as bathrooms and kitchen/dining areas [11,14,16,18,20,21].
The contribution of each nationality reflected the association between nationality and occupation, as well as the differences in the social contact structure in Qatar [14,18,20,21]. Social contacts are more prominent within nationality groups who share the same culture, language, and/or national background. A COVID-19 risk score can be seen as a metric that quantifies these variations in any setting, creating an opportunity for more effective public health action that addresses the needs of different segments of the population.
This study has some limitations. The COVID-19 risk score was derived using the national testing database rather than a nationally representative, probability-based survey of the total population of Qatar. Infection levels and patterns among tested individuals may not necessarily reflect actual levels and patterns in the wider population. The score used a small number of demographic variables, but its predictive power might have been enhanced if other variables had been available, such as more socio-demographic indicators.
Because of the central tracking of all PCR testing in Qatar, we were able to conduct this study on a very large sample size, which may not be available in other countries. However, we conducted a sensitivity analysis where the same regression analysis was done on only (random) 25% of the sample size. The analysis yielded similar effect sizes, suggesting that sample size should not be a hindrance in applying this concept to other countries and settings (S1 Table). Of note, the impact of each variable is reflected and measured by the score points. For example, Nepalese nationality was very predictive, with a score of 14, while Qatari nationality had a score of -2 and was not as highly predictive (Table 2A).
It was not possible to account for other factors, such as geography and comorbidities, as such data were not available. The study covered the duration of only one epidemic wave that was followed by a long low-incidence phase, lasting for seven months. Qatar is primarily a city state where infection was broadly distributed across the country's neighborhoods/areas; thus, geography is unlikely to have been a confounding factor. We conducted a sensitivity analysis where the multivariable logistic regression included a random effect at the PCR testing site, and the regression generated similar results to the baseline analysis (S2 Table). While data on comorbidities were not available, adjusting for age may have served as a proxy given the association between comorbidities and older age. Nonetheless, keeping the risk score as simple as possible and the number of variables to a minimum enhances its value for broad implementation as part of awareness campaigns and in primary care settings. Although the diagnostic metrics (sensitivity, specificity, and PPV) were not particularly high, the value of a risk score is not in providing a highly sensitive and specific measure for infection diagnosis. Infection diagnosis should only be done using biological testing such as PCR. The value of a risk score is to optimize whom to test by PCR, resulting in reduced costs, consumption of resources, and logistics. The risk score enables identification of persons more likely to be infected shortly after infection, thereby reducing the potential severity of the infection through treatment and allowing faster isolation to reduce infection transmission.
Despite these limitations, the study had important strengths. The testing database was massive and encompassed all RT-PCR testing done in Qatar using validated commercial platforms with very high sensitivity and specificity. The database included results of over two million tests, representing a majority of the population of Qatar [8,42]. While adding other variables to the score may have improved its predictive power, it may have reduced its accessibility and utility for broad use as a tool of public health. The score's value lies in providing a non-invasive tool for identifying individuals at higher risk of being infected, who should be prioritized for PCR testing, in addition to typical cases of clinical suspicion and contact tracing. Therefore, use of such scores may substantially enhance the effectiveness of the "testing, tracing, and isolation" approach that is the "backbone" of the COVID-19 national response in different countries [2]. Indeed, the present analyses have helped to guide Qatar's national COVID-19 response to control transmission and to reduce the disease burden.
In conclusion, the concept and utility of a COVID-19 risk score were demonstrated in a single country. Policy makers should consider the application of this method in streamlining PCR testing to minimize costs, consumption of resources, and logistics. Such a public health tool, based on a set of non-invasive and easily captured variables, can help optimize testing and suppression of infection transmission, while maximizing efficient use of available resources.

Supporting information
S1 Table. Results of multivariable logistic regression analysis using only 10% of the sample to derive the original Qatar COVID-19 risk score. (DOCX)
S2 Table. Results of multivariable random-effect logistic regression analysis (with random level at location of PCR testing) used to derive the original Qatar COVID-19 risk score.
We thank Dr. Mariam Abdulmalik, CEO of the Primary Health Care Corporation and the Chairperson of the Tactical Community Command Group on COVID-19, as well as members of this committee, for providing support to the teams that worked on the field surveillance. We further thank Dr. Nahla Afifi, Director of Qatar Biobank (QBB), Ms. Tasneem Al-Hamad, Ms. Eiman Al-Khayat and the rest of the QBB team for their unwavering support in retrieving and analyzing samples and in compiling and generating databases for COVID-19 infection, as well as Dr. Asmaa Al-Thani, Chairperson of the Qatar Genome Programme Committee and Board Vice Chairperson of QBB, for her leadership of this effort. We also acknowledge the dedicated efforts of the Clinical Coding Team and the COVID-19 Mortality Review Team, both at Hamad Medical Corporation, and the Surveillance Team at the Ministry of Public Health.